Abstract

The patterns of scientific collaboration have been frequently investigated in terms of complex networks without reference to time evolution. In the present work, we derive collaborative networks (from the arXiv repository) parameterized along time. By defining the concept of affine group, we identify several interesting trends in scientific collaboration, including the fact that the average size of the affine groups grows exponentially, while the number of authors increases as a power law. We were therefore able to identify, through extrapolation, the possible date when a single affine group is expected to emerge. Characteristic collaboration patterns were identified for each researcher, and their analysis revealed that larger affine groups tend to be less stable.
1. Introduction



The progressive and inexorable informatization of scientific publishing has had several important consequences, including the possibility of quantifying and analyzing the patterns that characterize scientific collaborations. For instance, many efforts have been dedicated to the identification of citations between articles (e.g. Amancio, Oliveira Jr., & Costa 2012a; Amancio, Nunes, Oliveira Jr., & Costa 2012; Roth, Wu, & Lozano 2012; Amancio, Oliveira Jr., & Costa 2012b; Persson 2010; Chen & Redner 2010). Another well-developed approach involves mapping and studying collaborations between researchers (e.g. Liljeros, Edling, Amaral, Stanley, & Aaberg 2001; Shrum, Chompalov, & Genuth 2001; Newman 2001, 2004). Such works often rely on complex networks (Costa, Rodrigues, Travieso, & Villas Boas 2007). In the case of collaboration networks, each researcher is mapped as a node, while joint authorships establish the links between those nodes. However, most such efforts disregard time, in the sense that citations and collaborations are aggregated over long periods. By doing so, important information about transient patterns of collaboration is overlooked. For instance, some collaborations are more likely to follow an intermittent pattern, while others would be expected to proceed over continuous periods of time.

The current work aims precisely at addressing this issue, which is accomplished by parameterizing the collaboration networks explicitly along time. So, instead of a single network, we derive a sequence of networks, each defined from a fixed starting time up to a successive instant (i.e. our networks are cumulative). For each node $i$ in each of these parameterized networks, we define its affine group as the union of two sets of nodes. The first set comprises the nodes directly attached to $i$, i.e. its co-authors. The second set comprises the nodes that belong to the same community (Girvan & Newman 2002) as node $i$, and therefore represents the authors most closely interrelated with $i$. Having obtained the time-parameterized networks and the respective affine groups, we analyze the evolution of the latter along time. More specifically, we calculate the mean size of the affine groups along time for three different collaboration networks extracted from the arXiv repository (www.arXiv.org). Remarkably, we found that these sizes grow exponentially, at different rates, while the number of authors in the respective networks grows more slowly, as a power law. We also found that different affine groups tend to exhibit rather distinct intermittence patterns, which suggested a classification of the authors according to their time-dependent collaboration patterns. So, for each author, we calculated the maximum size of the affine groups to which he or she belonged, as well as the average duration of the respective collaborations. These findings suggest that authors who collaborate with more people also tend to have shorter collaborations.

Figure 1: Example of the growth of collaborative networks from $t = t_0$ to $t = t_0 + 2\Delta t$. The toy database comprises 12 authors and 10 papers. (a) Collaborative network at $t = t_0$, built from the following list of 7 papers and 10 authors (AX represents Author X): (i) paper 1 (A1, A2 and A3); (ii) paper 2 (A1 and A2); (iii) paper 3 (A1, A2 and A4); (iv) paper 4 (A5 and A6); (v) paper 5 (A7, A8, A9 and A10); (vi) paper 6 (A1, A5 and A9); (vii) paper 7 (A6 and A8). (b) Collaborative network at $t = t_0 + \Delta t$, when paper 8 (A6 and A9) and paper 9 (A9, A4 and A10) are included. New edges are represented as dotted lines. (c) Collaborative network at $t = t_0 + 2\Delta t$, when paper 10 (A5, A11 and A12) is included. New nodes are shown in orange.



2. Methodology 

2.1. The time-varying collaboration network 

The following procedure was applied in order to represent the relationships between authors working on a specific topic. Let $A = \{a_{ij}\}$ be the adjacency matrix of the undirected and unweighted network. If authors $i$ and $j$ have collaborated on at least one paper of the database published up to the instant considered, a link between them is established, so that $a_{ij} = 1$; otherwise, $a_{ij} = 0$. Figure 1 illustrates how the collaborative networks are constructed. Note that at every instant of time, new edges and new nodes may be included in the network.
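The cumulative construction can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes that papers are grouped into batches, one batch per time step, and that authors are identified by integer labels; the function name `cumulative_networks` is hypothetical. Each snapshot keeps every node and edge of the previous ones, which is what makes the networks cumulative.

```python
from itertools import combinations


def cumulative_networks(batches):
    """Build one cumulative snapshot per time step.

    batches: list of time steps; each time step is a list of papers and
    each paper is an iterable of author labels (hypothetical input format).
    Returns a list of (nodes, edges) pairs, where edges are sorted pairs
    (i, j) with i < j, i.e. the entries with a_ij = a_ji = 1.
    """
    nodes, edges = set(), set()
    snapshots = []
    for papers in batches:
        for authors in papers:
            authors = sorted(set(authors))
            nodes.update(authors)
            # every pair of co-authors of the same paper becomes linked
            edges.update(combinations(authors, 2))
        # freeze a copy: later time steps can only add nodes and edges
        snapshots.append((frozenset(nodes), frozenset(edges)))
    return snapshots


# Toy database of Figure 1 (A1 -> 1, ..., A12 -> 12)
toy = [
    [[1, 2, 3], [1, 2], [1, 2, 4], [5, 6], [7, 8, 9, 10], [1, 5, 9], [6, 8]],
    [[6, 9], [9, 4, 10]],
    [[5, 11, 12]],
]
for step, (nodes, edges) in enumerate(cumulative_networks(toy)):
    print(f"t0 + {step}*dt: {len(nodes)} nodes, {len(edges)} edges")
```

For the toy database this yields 10, 10 and 12 nodes and 16, 19 and 22 edges at the three instants: paper 9 adds only two new edges because A9 and A10 had already collaborated on paper 5.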



We built three collaboration networks using the arXiv repository, each based on papers about a specific topic. We adopted the criteria employed in (Amancio, Oliveira Jr., & Costa 2012a; Amancio, Nunes, Oliveira Jr., & Costa 2012; Amancio, Oliveira Jr., & Costa 2012b): given a keyword, we selected all papers in arXiv that contain this keyword in the title or abstract. The keywords chosen were complex networks, graphene and topological insulator. For simplicity's sake, we refer to the respective networks as COMPNET, GRAPHENE, and TOPINSU. These three topics were chosen because they are of current interest in Physics. Specifically, one network was obtained for each year covered by each of the three datasets, and the evolution of collaborative groups of authors was studied in terms of these time-varying collaboration networks. Details regarding the networks are given in Table 1.
Table 1: Database and network statistics. P is the number of papers, N is the number of authors and m is the number of edges. $\lambda$ is the value of the parameter used to fit the exponential growth of the average size of the affine groups.

Network     P      N      m       λ
COMPNET     1316   2013   5342    0.56
GRAPHENE    4468   6490   24956   1.09
TOPINSU     778    1436   5537    1.87
Figure 2: Collaboration network COMPNET in 2011. Communities revealed by the algorithm are represented by different node colors in the giant component. Small components were disregarded.



3. The affine group 



Here we define the main concept of this paper, the affine group. For each author $i$ belonging to the set $\mathcal{A}$ of authors, we aim at identifying the subset of authors who are potentially interested in the same research subject. The most natural candidates for the affine group of $i$ are the current or previous collaborators of $i$, i.e. the set $V_i(t) = \{ j \in \mathcal{A} \mid a_{ij}(t) > 0 \}$. Obviously, authors potentially interested in the same subject may never have collaborated in the past. To account for this case, we use the concept of community (Girvan & Newman 2002) in networks. A community is a subnetwork (i.e. a group of nodes) that is more densely connected internally than with the rest of the network. Formally, a subnetwork is a community if the number of triangles (sets of three mutually connected nodes) is larger than $\kappa \sum_i k_i (k_i - 1)$, where $\kappa$ is an arbitrary constant (Seshadhri, Kolda, & Pinar 2012) and $k_i = \sum_j a_{ij}$ is the degree of node $i$. This suggests that authors belonging to the same community have, on average, more affinity than authors belonging to different communities. This might also be inferred from the observation that different communities have particular properties in networks displaying both community structure and assortativity (Newman 2006). In order to include communities in the definition of the affine group, let $C_i(t)$ represent the set of nodes in the same community as $i$ at instant $t$. Thus, the affine group of node $i$ at instant $t$, denoted by $G_i(t)$, is given by $G_i(t) = V_i(t) \cup C_i(t)$. To compute $C_i(t)$, we used the algorithm proposed by Newman (Newman 2006) (see Appendix A) to identify the communities of each collaborative network. This algorithm searches for the network partition that maximizes the modularity function $Q$, given as



$$Q = \frac{1}{2m} \sum_{i,j} \left[ a_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \tag{1}$$



where $m$ is the number of edges, $k_i$ is the degree of node $i$, and $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ belong to the same community and $\delta(c_i, c_j) = 0$ otherwise. We also assume that each disjoint component with fewer than ten nodes corresponds to a single community. Therefore, the algorithm was applied only to the remaining components, i.e. those with at least ten nodes. Figure 2 shows the partition of COMPNET in 2011.
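Equation (1) can be evaluated for a given partition with the short sketch below. It is a hypothetical helper, not the implementation of Newman (2006): it only scores a partition and does not search for the one that maximizes $Q$. It uses the equivalent form $Q = \sum_c [\, l_c/m - (K_c/2m)^2 \,]$, obtained from Equation (1) because the double sum of $a_{ij}$ within a community equals twice its number of internal edges $l_c$, and the double sum of $k_i k_j$ equals the square of its total degree $K_c$.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of Equation (1) for an undirected, unweighted graph.

    edges: list of distinct pairs (i, j) without self-loops (assumption).
    community: dict mapping every node to its community label c_i.
    Computes Q = sum_c [ l_c / m - (K_c / 2m)^2 ], where l_c is the number
    of edges inside community c and K_c the total degree of its nodes.
    """
    m = len(edges)
    internal = defaultdict(int)      # l_c
    total_degree = defaultdict(int)  # K_c
    for i, j in edges:
        total_degree[community[i]] += 1
        total_degree[community[j]] += 1
        if community[i] == community[j]:
            internal[community[i]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives $Q = 2\,(3/7 - 1/4) = 5/14 \approx 0.357$, whereas placing all six nodes in one community gives $Q = 0$.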

Under this definition, the affine group of a researcher is the set of people who are likely to be especially interested in his or her work, or whose work he or she is most likely to be interested in. It is important to note that the affine group of a given node changes over time, following the evolution of the network topology. To illustrate this concept, Figure 3 shows a synthetic network at two different time instants, together with the affine group of node $i = 1$. In Figure 3(a), one can see that at instant $t$ the network has 9 nodes and the affine group of node 1 is $G_1(t) = \{2, 3, 4, 5, 6, 7\}$. In Figure 3(b), we show the network at instant $t + \Delta t$, which has evolved as a consequence of the addition of new authors and new collaborations during the interval $\Delta t$. The affine group of node 1 is now $G_1(t + \Delta t) = \{2, 3, 4, 6, 7, 11, 12\}$. Therefore, while node 5 no longer belongs to the affine group of node 1, two new nodes (11 and 12) have joined it. How the affine groups evolve along time is the main focus of this paper. This dynamics is intrinsically determined by how the authors interact and, consequently, by how the communities in this time-varying collaboration network merge with each other or split, creating new communities.
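A minimal sketch of the affine group for a single snapshot, together with the members lost and gained between two instants, is given below. The helper names are hypothetical, and we assume, following the example of Figure 3, that node $i$ itself is not counted in $G_i(t)$; the community labels are taken as given (e.g. from a modularity-based algorithm).

```python
def affine_group(i, neighbours, community):
    """G_i(t) = V_i(t) ∪ C_i(t) for one snapshot.

    neighbours: dict node -> set of co-authors, i.e. V_i(t).
    community: dict node -> community label at the same instant.
    Node i itself is excluded (assumed reading of Figure 3).
    """
    same_community = {j for j, c in community.items() if c == community[i]}
    return (neighbours.get(i, set()) | same_community) - {i}


def group_changes(before, after):
    """Return (left, joined): members lost and gained between two instants."""
    return before - after, after - before
```

Applied to the two sets of Figure 3, `group_changes({2, 3, 4, 5, 6, 7}, {2, 3, 4, 6, 7, 11, 12})` returns `({5}, {11, 12})`, and the group size $g_i(t)$ is simply `len(affine_group(i, neighbours, community))`.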



4. Results and Discussion 

In Figure 4(a-d), we show the affine group of four different researchers of the network COMPNET as a function of time. Each column corresponds to $G_i(t)$ for a given value of $t$, and yellow marks indicate that a given author belongs to the affine group of the reference node at that instant. The rows are sorted from the least active collaboration (top row) to the most active collaboration (bottom row). We see that, in general, the emerging patterns tend to be cumulative, in the sense that once a node $j$ has entered the affine group of node $i$, it tends to remain in this group in the future. As a consequence, we expect the size of the collaboration groups to increase along time. The size of an affine group as a function of time, denoted by $g_i(t) = |G_i(t)|$, is given by the sum of the corresponding column in Figure 4.

We also studied the values of $g_i(t)$ averaged over all authors of COMPNET, TOPINSU and GRAPHENE as a function of time. As time progresses, the groups grow, and the average behavior is well described by an exponential. Indeed, when plotting the same curve in semi-log scale, as shown in Figure 4(e), we note that the average values are well fitted by the function $\langle g(t) \rangle = g_0 e^{\lambda t}$. Figure 4(e) also shows that the exponential behavior is not particular to COMPNET, since both GRAPHENE and TOPINSU also exhibit exponential growth. In particular, the values of $\lambda$ reveal that the affine groups grow fastest, on average, in the TOPINSU network.
Figure 3: Synthetic network illustrating the concept of affine group. (a) The network at instant $t$ has 9 nodes, and the affine group of node 1, denoted by $G_1(t)$, is the set $\{2, 3, 4, 5, 6, 7\}$, whose size is $g_1(t) = 6$. (b) At instant $t + \Delta t$, the network has 16 nodes and the affine group of node 1 is $G_1(t + \Delta t) = \{2, 3, 4, 6, 7, 11, 12\}$, whose size is $g_1(t + \Delta t) = 7$.



Table 2: z-scores of the probability that a new collaboration is established within affine groups. The values are normalized by the probability expected if new connections were established just by chance. Note that almost all values are positive, which confirms that new links are established preferentially within affine groups.

Network   2000  2001  2002  2003  2004   2005  2006   2007  2008  2009  2010  2011
COMPNET                           -0.31  4.27  10.43  4.01  9.80  2.84  6.30  0.57
TOPINSU                                                     83.4  31.3  34.9  25.6
GRAPHENE                                                                20.1  29.0



On the other hand, in COMPNET the growth of the affine groups appears to be slower than in the other two research fields. The values of $\lambda$ that best fit the data for each network are listed in Table 1.
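The exponential model $\langle g(t) \rangle = g_0 e^{\lambda t}$ becomes a straight line in semi-log scale, $\log \langle g \rangle = \log g_0 + \lambda t$, so a simple way to estimate $g_0$ and $\lambda$ is a linear least-squares fit on the logarithm. The sketch below assumes this log-linear fit, which minimizes squared errors of $\log \langle g \rangle$ and is therefore not identical to the chi-square criterion mentioned in the caption of Figure 4; the function name is hypothetical.

```python
import numpy as np


def fit_exponential(t, mean_size):
    """Fit <g(t)> = g0 * exp(lam * t) by least squares on log <g(t)>.

    This is a straight-line fit in semi-log scale. It minimizes squared
    errors of the logarithm, which is not the same objective as a
    chi-square fit on the raw values.
    """
    t = np.asarray(t, dtype=float)
    log_g = np.log(np.asarray(mean_size, dtype=float))
    lam, log_g0 = np.polyfit(t, log_g, 1)  # slope first, then intercept
    return float(np.exp(log_g0)), float(lam)
```

On noiseless synthetic data such as `3.0 * np.exp(0.56 * t)` the fit recovers $g_0 = 3$ and $\lambda = 0.56$; on real data the two estimators can differ noticeably when the early, small group sizes are noisy.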

While the average size of the affine groups grows exponentially, we found that the number of authors in the three studied networks grows only as a power law along time. Because an exponential eventually outgrows any power law, and an affine group cannot contain more authors than the whole network, this points to a tendency toward the emergence of a single, giant group. In other words, the connectivity patterns of the collaborative networks suggest the eventual emergence of a group of global collaboration. Interestingly, the time of this emergence can be estimated by extrapolating both fits up to the instant at which they intersect. In the case of COMPNET, a unique affine group would emerge around the year 2020, as shown in Figure 5.
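Finding the intersection of the two fits amounts to solving $g_0 e^{\lambda t} = c\, t^{\alpha}$ for $t$. The sketch below does this by bisection on the log-difference; the power-law prefactor $c$ and exponent $\alpha$, the function name and the bracket are hypothetical inputs, not values reported here. Since the log-difference has a minimum at $t = \alpha/\lambda$, the two curves can cross twice, so the bracket should enclose the late-time crossing.

```python
import math


def crossing_time(g0, lam, c, alpha, lo, hi, tol=1e-9):
    """Find t in [lo, hi] where g0*exp(lam*t) = c*t**alpha, by bisection.

    Works on the log-difference f(t) = log(g0) + lam*t - log(c) - alpha*log(t),
    so lo must be positive and f must change sign on [lo, hi].
    """
    def f(t):
        return math.log(g0) + lam * t - math.log(c) - alpha * math.log(t)

    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) have the same sign")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)
```

For instance, with synthetic parameters $\lambda = 0.5$, $c = 1$, $\alpha = 1.5$ and $g_0 = 20^{1.5} e^{-10}$, chosen so that the curves meet at $t = 20$, `crossing_time(g0, 0.5, 1.0, 1.5, 5.0, 100.0)` converges to 20.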

4.1. Probability of a new connection

We also evaluated the probability $P$ that a new connection established at instant $t + \Delta t$ links two nodes $i$ and $j$ such that one of them belonged to the affine group of the other. In order to test the robustness of our findings, we report in Table 2 the values of $P$ along time, normalized by the z-score. To compute the z-score, we counted the number $e_r$ of randomly placed new edges that fall inside affine groups. This procedure was repeated 100 times, and the average $\langle e_r \rangle$ and standard deviation $\Delta e_r$ were computed. If $e$ represents the actual number of new edges established inside affine groups, then the z-score is defined as

$$Z = \frac{e - \langle e_r \rangle}{\Delta e_r}. \tag{2}$$

Note that the normalized version of $P$ given in Equation (2) quantifies whether the number of new links inside affine groups is greater than what would be expected just by chance. The values of $Z$ are given in Table 2. With the exception of COMPNET in 2004, all observed values are positive. Remarkably, the z-scores for TOPINSU and GRAPHENE are particularly high. This means that new collaborations are preferentially established within affine groups. This pattern could therefore be used, for example, to suggest future collaborations, since authors belonging to the same affine group probably share research interests.
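The text does not detail how the random edges are drawn, so the sketch below assumes one plausible null model: the same number of new edges is sampled uniformly, without replacement, from a caller-supplied list of eligible author pairs (for instance, pairs not yet connected), and a pair counts as "inside" when a caller-supplied predicate says that one author belongs to the affine group of the other. It also assumes that $\Delta e_r$ is the population standard deviation. All names are hypothetical.

```python
import random
import statistics


def random_counts(candidate_pairs, in_affine_group, n_new, runs=100, seed=0):
    """Null model (assumption): draw n_new pairs uniformly at random from
    candidate_pairs (a list) and count how many satisfy in_affine_group(i, j);
    repeat `runs` times and return the list of counts e_r."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        sample = rng.sample(candidate_pairs, n_new)  # without replacement
        counts.append(sum(1 for i, j in sample if in_affine_group(i, j)))
    return counts


def z_score(e, counts):
    """Equation (2): Z = (e - <e_r>) / Delta e_r, population std. dev."""
    mean = statistics.mean(counts)
    spread = statistics.pstdev(counts)
    if spread == 0:
        raise ValueError("null-model counts have zero spread; Z is undefined")
    return (e - mean) / spread
```

A typical call would pass the observed count $e$ together with `random_counts(pairs, lambda i, j: j in groups[i] or i in groups[j], e_total)`, where `groups[i]` holds $G_i(t)$ and `e_total` is the number of new edges observed between $t$ and $t + \Delta t$.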
Figure 4: (a-d) Evolution of the affine group of four different authors of the network COMPNET. (e) Evolution of the average size of the affine group for the three collaboration networks considered in this paper. We also tried to fit these data with other functions, namely quadratic and cubic polynomials and a power law; the exponential fit yielded the smallest chi-square error.
Figure 5: Exponential fit for the average size of the affine group and power-law fit for the number of authors $N(t)$ in COMPNET. The two curves intersect at $t \approx 20$.



4.2. Authors classification 

As shown in Figure 4, different authors tend to have different collaboration patterns. For instance, the author shown in Figure 4(a) has intermittent collaborations, in contrast with the author in Figure 4(d). In order to investigate what these patterns tell us about the scientific community, we extracted the following features from the affine group of each author: (i) $g_{max}$, the maximum size of the affine group along the years, and (ii) $s$, the average length of the periods during which authors remained in the same affine group. Figure 6 shows how the authors are distributed according to these two attributes. The overall distribution of authors in this figure indicates that authors who tend to participate in larger affine groups also have shorter collaboration periods. On the other hand, authors in region A of Figure 6 have small groups, but these groups last for long periods of time. Since the nodes are colored according to their degree, the figure also shows that this result is not affected by the number of co-authors.
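The two features can be extracted from the yearly affine groups of one author with the sketch below. The definition of $s$ is only stated loosely above, so we assume that it is the mean length, in years, of the uninterrupted runs during which a given co-member stays in $G_i(t)$, pooled over all co-members; the function name is hypothetical.

```python
def author_features(groups_by_year):
    """Return (g_max, s) for one author.

    groups_by_year: list of sets G_i(t), one per consecutive year.
    g_max: largest affine-group size over the years.
    s: mean length (in years) of the uninterrupted runs during which a
       co-member stays in G_i(t), pooled over all co-members (assumed
       reading of "average duration of collaboration").
    """
    g_max = max((len(g) for g in groups_by_year), default=0)
    runs = []
    for j in set().union(*groups_by_year):
        length = 0
        for group in groups_by_year:
            if j in group:
                length += 1
            elif length:
                runs.append(length)  # the run of j has just ended
                length = 0
        if length:
            runs.append(length)      # run still open in the last year
    s = sum(runs) / len(runs) if runs else 0.0
    return g_max, s
```

For example, the yearly groups `[{2, 3}, {2, 3, 4}, {2, 4}, {2}]` give $g_{max} = 3$ and runs of 4, 2 and 2 years, hence $s = 8/3$, while an intermittent sequence such as `[{2}, set(), {2}]` gives two runs of one year and $s = 1$.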
Figure 6: Scatter plot of $g_{max}$, the maximum size of the affine group, versus $s$, the average duration of collaboration. Nodes are colored according to the author degree in the network shown in Figure 2.



5. Conclusions 



The problem of scientific collaboration has been addressed by a large number of works in terms of complex networks. However, the effect of time has not usually been taken into account. Indeed, the study of time-varying network models has been restricted to a few works (Perra, Gonçalves, Pastor-Satorras, & Vespignani 2012). In the present article, we obtained sequences of networks from arXiv parameterized in terms of time, which allowed us to investigate the evolution of collaboration patterns. This was accomplished by defining the affine group of each author and then measuring the size of these groups and the duration of pairwise collaborations. Several interesting results have been obtained. First, the average size of the affine groups grows exponentially, while the total number of authors grows as a power law. This implies that a single affine group will eventually emerge, and by extrapolation we were able to estimate the date of such an event. Another finding was that researchers tend to exhibit different collaboration patterns as far as intermittency is concerned. We mapped these patterns into a 2D space defined by the size of the affine groups and the average duration of the collaborations, which showed that authors who belong to large affine groups tend to have shorter collaboration periods. We also found that authors tend to collaborate mainly with colleagues belonging to their affine group, probably because they share similar research interests. Future works could take into account the intensity of the collaborations (weighted complex networks), as well as investigate the dynamics of migration of authors between groups. Another possibility is to introduce a decay factor for edge weights, so that old collaborations are gradually disregarded in the analysis of collaboration patterns.
Appendix A. Community detection in complex networks



A feature shared by many real systems modeled as complex networks is the presence of community structure (Girvan & Newman 2002). Nodes of clustered networks tend to organize into groups with many internal connections (more than would be expected just by chance) and few external links to the other communities (Newman 2006). Social and information networks (see e.g. Liljeros, Edling, Amaral, Stanley, & Aaberg 2001; Chen & Redner 2010; Ding 2011) are some examples of networks displaying this type of organization. A large number of recent results suggest that real complex networks may display local properties that are very distinct from the global properties of the entire network, so that focusing on the network as a whole, without considering its community structure, may overlook many interesting features of the modeled system (Newman 2006).



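In the current paper, the method employed to detect communities is based on spectral decomposition (Mieghem 2001). For simplicity's sake, let us consider the case where the network comprises two communities. Let $R$ be the number of edges between communities, given by

$$R = \frac{1}{2} \sum_{\substack{i, j \text{ in different} \\ \text{communities}}} a_{ij}, \tag{A.1}$$

where the sum is taken only over pairs of nodes $i$ and $j$ placed in different communities, and the factor $1/2$ compensates for each edge being counted twice. The membership of each node is stored in a vector $\vec{s}$ comprising $n$ elements: if node $i$ belongs to community $G_1$, then $s_i = 1$; otherwise, $s_i = -1$ and $i$ belongs to $G_2$. Since $\frac{1}{2}(1 - s_i s_j)$ equals 1 when $i$ and $j$ are in different communities and 0 otherwise, introducing $\vec{s}$ in Equation (A.1) allows it to be rewritten as

$$R = \frac{1}{4} \sum_{i,j} a_{ij} \left( 1 - s_i s_j \right). \tag{A.2}$$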



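Computing the local degree of node $i$ as

$$k_i = \sum_j a_{ij}, \tag{A.3}$$

the total degree of the network is given by

$$\sum_{i,j} a_{ij} = \sum_i k_i = \sum_i k_i s_i^2 = \sum_{i,j} k_i \delta_{ij} s_i s_j, \tag{A.4}$$

where $\delta_{ij}$ is the Kronecker delta and we used $s_i^2 = 1$. Replacing Equation (A.4) in Equation (A.2), one obtains

$$R = \frac{1}{4} \sum_{i,j} \left( k_i \delta_{ij} - a_{ij} \right) s_i s_j = \frac{1}{4} \vec{s}^{\,T} L \vec{s}, \tag{A.5}$$

where $L$, with elements $L_{ij} = k_i \delta_{ij} - a_{ij}$, is the Laplacian matrix of the network.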
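The equivalence between Equations (A.1) and (A.5) can be checked numerically. The sketch below (helper names are hypothetical) computes the cut size both by direct counting and through the Laplacian quadratic form, and compares them for every possible membership vector $\vec{s}$ of a small example graph made of two triangles joined by one edge.

```python
from itertools import product

import numpy as np


def cut_size_direct(adj, s):
    """Equation (A.1): half the sum of a_ij over pairs in different groups."""
    n = len(s)
    return 0.5 * sum(adj[i][j] for i in range(n) for j in range(n)
                     if s[i] != s[j])


def cut_size_laplacian(adj, s):
    """Equation (A.5): R = (1/4) s^T L s, with L = D - A."""
    a = np.asarray(adj, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    s = np.asarray(s, dtype=float)
    return 0.25 * s @ lap @ s


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
# The two expressions agree for every one of the 2^6 sign vectors s
assert all(abs(cut_size_direct(A, s) - cut_size_laplacian(A, s)) < 1e-12
           for s in product([1, -1], repeat=6))
```

For the natural split $\vec{s} = (1, 1, 1, -1, -1, -1)$, both functions return $R = 1$, the single edge between the triangles.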



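Because $R$ represents the number of inter-community edges, the objective is to minimize this quantity. To do so, we first expand $\vec{s}$ as a linear combination of the normalized eigenvectors $\vec{v}_i$ of the Laplacian matrix $L$:

$$\vec{s} = \sum_{i=1}^{n} \alpha_i \vec{v}_i, \qquad \alpha_i = \vec{v}_i^{\,T} \vec{s}, \qquad R = \frac{1}{4} \sum_{i=1}^{n} \alpha_i^2 \lambda_i, \tag{A.6}$$

where $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ are the eigenvalues associated with $\vec{v}_i$. Since $\sum_i \alpha_i^2 = \vec{s}^{\,T} \vec{s} = n$ is fixed, minimizing $R$ requires associating the largest coefficients $\alpha_i$ with the smallest eigenvalues $\lambda_i$. Thus, the objective would be to position $\vec{s}$ parallel to $\vec{v}_1$. Unfortunately, this optimization strategy yields a trivial solution: for a connected network, $\vec{v}_1$ is proportional to the uniform vector $(1, 1, \dots, 1)$, so all nodes would belong to a single, giant community. For this reason, rather than $\vec{v}_1$, the eigenvector $\vec{v}_2$ is chosen as the direction onto which $\vec{s}$ is projected. To minimize the number of inter-community edges, a simple heuristic is to set $s_i = 1$ whenever the $i$-th element of $\vec{v}_2$ is negative and, analogously, $s_i = -1$ whenever the $i$-th element of $\vec{v}_2$ is positive. This makes the inner product between $\vec{s}$ and $\vec{v}_2$ as negative as possible and hence maximizes $\alpha_2^2$; the opposite sign convention yields the same partition with the community labels swapped.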
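The sign rule above translates into a short NumPy sketch of spectral bisection. It is an illustration under the assumptions of this appendix (a connected, undirected, unweighted network split into exactly two communities), not the modularity-based algorithm of Newman (2006); the function name is hypothetical. If $\lambda_2$ is degenerate, the choice of $\vec{v}_2$ is not unique and the resulting partition may vary.

```python
import numpy as np


def spectral_bisection(adj):
    """Split a connected graph in two using the Laplacian's second eigenvector.

    Follows the sign rule in the text: s_i = 1 where the i-th entry of v_2
    is negative and s_i = -1 otherwise (entries equal to zero go to -1).
    Returns the membership vector s and the cut size R = s^T L s / 4.
    """
    a = np.asarray(adj, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    _, eigenvectors = np.linalg.eigh(lap)  # eigenvalues in ascending order
    v2 = eigenvectors[:, 1]                # eigenvector of lambda_2
    s = np.where(v2 < 0, 1, -1)
    return s, 0.25 * s @ lap @ s
```

On a graph made of two triangles joined by one edge, $\lambda_2 = (5 - \sqrt{17})/2 \approx 0.44$ is non-degenerate, and the sketch separates the two triangles with $R = 1$.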



To exemplify how the algorithm works, we detected two communities in the social network depicted in Figure A.7. As the figure reveals, the two communities are clearly identified. The method employed by Newman (2006) is very similar to the one described here. Rather than minimizing the number of inter-community edges, Newman (2006) maximizes the number of intra-community edges in excess of what would be expected just by chance, as measured by the modularity function. He also provides methods to identify the potential presence of more than two communities.
Figure A.7: Example of a partition obtained with the method based on spectral decomposition for the dolphin social network (Lusseau, Schneider, Boisseau, Haase, Slooten, & Dawson 2003). As expected, each community comprises many intra-community edges and few inter-community edges.